Bootstrapping Distantly Supervised IE Using Joint Learning and Small Well-Structured Corpora
Abstract
We propose a framework to improve the performance of distantly supervised relation extraction by jointly learning to solve two related tasks: concept-instance extraction and relation extraction. We further extend this framework to make novel use of document structure: in some small, well-structured corpora, sections can be identified that correspond to relation arguments, and distantly labeled examples from such sections tend to have good precision. Using these as seeds, we extract additional relation examples by applying label propagation on a graph composed of noisy examples extracted from a large unstructured testing corpus. Combined with the soft constraint that concept examples should have the same type as the second argument of the relation, we get significant improvements over several state-of-the-art approaches to distantly supervised relation extraction, and reasonable extraction performance even with a very small set of...
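The seed-based label propagation step mentioned in the abstract can be illustrated with a minimal sketch. This is a generic iterative propagation with clamped seeds, not the authors' actual formulation: the function names, the graph layout, and the toy node names (`seed_pos`, `seed_neg`, `cand_a`, `cand_b`) are all hypothetical, and the abstract does not specify edge weights or the propagation variant used.

```python
def build_graph(weighted_edges):
    """Build an undirected adjacency map from (node_a, node_b, weight) triples."""
    graph = {}
    for a, b, w in weighted_edges:
        graph.setdefault(a, []).append((b, w))
        graph.setdefault(b, []).append((a, w))
    return graph


def propagate_labels(graph, seeds, iterations=50):
    """Generic label propagation (an assumption, not the paper's exact method).

    graph: node -> list of (neighbor, weight)
    seeds: node -> fixed score in [0, 1]; seed scores are clamped every round
    Returns node -> score, where higher means "more likely a relation example".
    """
    scores = {node: seeds.get(node, 0.0) for node in graph}
    for _ in range(iterations):
        updated = {}
        for node, neighbors in graph.items():
            if node in seeds:
                # High-precision seeds keep their label.
                updated[node] = seeds[node]
                continue
            total = sum(w for _, w in neighbors)
            # Non-seed nodes take the weighted average of their neighbors.
            updated[node] = (
                sum(w * scores[n] for n, w in neighbors) / total if total > 0 else 0.0
            )
        scores = updated
    return scores


# Toy chain: seed_pos - cand_a - cand_b - seed_neg (hypothetical data).
graph = build_graph([
    ("seed_pos", "cand_a", 1.0),
    ("cand_a", "cand_b", 1.0),
    ("cand_b", "seed_neg", 1.0),
])
scores = propagate_labels(graph, {"seed_pos": 1.0, "seed_neg": 0.0})
```

In this toy graph, the candidate closer to the positive seed ends up with the higher score, which is the intuition behind using precise seeds to rank noisy distantly labeled examples.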
Similar Resources
Semi-supervised Bootstrapping of Relation Triples from the Web, Query Languages over these Noisy Triples, their Semantics, and Query Execution Systems
Information Extraction (IE) is the process of retrieving structured information from unstructured text. IE has traditionally relied on extensive human intervention to extract a small set of predefined relations from the corpus. Now, with the Web coming into the picture, the methods and goals of IE have taken a slight detour, with increasing focus on the following challenges: 1. Domain independent/Open Information...
Bootstrapping Chatbots for Novel Domains
We tackle the problem of automatically generating chatbots from Web API specifications using embedded natural language metadata, focusing on the intent classification subtask. One of the main challenges for such a use case comes from the lack of a sufficiently representative training sample for utterance classification, which hinders the traditional supervised model’s ability to generalize to u...
Distant IE by Bootstrapping Using Lists and Document Structure
Distant labeling for information extraction (IE) suffers from noisy training data. We describe a way of reducing the noise associated with distant IE by identifying coupling constraints between potential instance labels. As one example of coupling, items in a list are likely to have the same label. A second example of coupling comes from analysis of document structure: in some corpora, sections...
Filtered Ranking for Bootstrapping in Event Extraction
Several researchers have proposed semi-supervised learning methods for adapting event extraction systems to new event types. This paper investigates two kinds of bootstrapping methods used for event extraction: the document-centric and similarity-centric approaches, and proposes a filtered ranking method that combines the advantages of the two. We use a range of extraction tasks to compare the ...
Semantic Role Labeling
This tutorial will describe semantic role labeling, the assignment of semantic roles to eventuality participants in an attempt to approximate a semantic representation of an utterance. The linguistic background and motivation for the definition of semantic roles will be presented, as well as the basic approach to semantic role annotation of large amounts of corpora. Recent extensions to this ap...